
Add analyte-trend, MCL-exceedance, and monitoring-recency products #90

Merged
jirhiker merged 6 commits into feature/st2-source-datastream-link from feature/trend-mcl-recency-products
Jun 28, 2026

Conversation

@jirhiker

Member

Stacked on #89 (uses the generalized trend engine + source_datastream_link). Base will retarget to main once #89 merges.

What

Three new data products on the per-source asset graph:

  1. ogc_analyte_trend — per-well analyte concentration trend (Mann-Kendall + Theil-Sen, daily mean). One product per analyte; seeds nm_arsenic_trend + nm_nitrate_trend.
  2. ogc_mcl_exceedance (nm_mcl_exceedance) — one feature per well flagging drinking-water MCL exceedances. Thresholds read at run time from gs://<products_bucket>/config/mcl.json (source of truth). Per analyte: value, _mcl, _mcl_type, _exceeds; plus any_exceedance, exceedance_count, exceeded_analytes.
  3. ogc_monitoring_recency (nm_monitoring_recency) — one feature per well: last_observation_datetime, days_since_last, status (active/stale at stale_days, default 365). Surfaces dead/lagging monitoring points.
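
The per-well layout in item 2 can be pictured with a minimal sketch. It is not the PR's code: the function name `exceedance_properties`, the input shapes, the use of the bare analyte name as the key for the value, the strict `>` comparison, and the `None` result when no threshold exists are all assumptions made for illustration. It also assumes values and MCLs share units and basis.

```python
from typing import Optional


def exceedance_properties(
    latest: dict[str, Optional[float]],
    thresholds: dict[str, dict],
) -> dict:
    """Build one well's exceedance properties from its latest value per analyte."""
    props: dict = {}
    exceeded: list[str] = []
    for analyte, value in latest.items():
        entry = thresholds.get(analyte)  # None when no threshold is configured
        mcl = entry["mcl"] if entry else None
        exceeds = None  # left unknown unless both a value and an MCL exist
        if value is not None and mcl is not None:
            exceeds = value > mcl  # assumption: strictly greater is an exceedance
        props[analyte] = value  # assumption: bare analyte name holds the value
        props[f"{analyte}_mcl"] = mcl
        props[f"{analyte}_mcl_type"] = entry["type"] if entry else None
        props[f"{analyte}_exceeds"] = exceeds
        if exceeds:
            exceeded.append(analyte)
    props["any_exceedance"] = bool(exceeded)
    props["exceedance_count"] = len(exceeded)
    props["exceeded_analytes"] = exceeded
    return props


# Hypothetical thresholds and latest values, all in mg/L.
thresholds = {
    "arsenic": {"mcl": 0.01, "type": "primary"},
    "nitrate": {"mcl": 10, "type": "primary"},
}
props = exceedance_properties(
    {"arsenic": 0.02, "nitrate": 3.1, "boron": 0.4}, thresholds
)
```

Here `boron` has no threshold, so its `_exceeds` flag stays `None` and it does not count toward `exceedance_count`.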

Implementation

  • Generalized the trend engine: dump_waterlevel_trend_collection → dump_trend_collection(slope_units, reducer, method, parameter_name); _daily_min_series → _daily_series(reducer = min|max|mean). Output field slope_ft_per_year → slope_per_year + slope_units. WL trend now calls it with reducer="min", slope_units="ft/year"; analyte trend with reducer="mean", slope_units="mg/L/year".
  • New dump_mcl_exceedance_collection (well pivot + threshold compare) and dump_monitoring_recency_collection.
  • GCSResource.read_json for the MCL file.
  • die_config treats ogc_mcl_exceedance as summary mode (latest value per analyte).
  • definitions registers the three new output types — each auto-gets a job + schedule.
  • products.yaml: 4 product entries. mcl.sample.json documents the schema (real file lives in GCS).
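
To make the reducer idea concrete, here is a rough, standard-library-only sketch of daily aggregation followed by a Theil-Sen slope and the Mann-Kendall S statistic. It is an illustration under assumptions, not the PR's implementation: the function names `daily_series`, `theil_sen_slope_per_year`, and `mann_kendall_s` are hypothetical, a year is taken as 365.25 days, and the significance test and qualification gate are omitted.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations
from statistics import mean, median

REDUCERS = {"min": min, "max": max, "mean": mean}


def daily_series(observations, reducer="mean"):
    """Collapse (datetime, value) pairs to one reduced value per calendar day."""
    by_day = defaultdict(list)
    for when, value in observations:
        by_day[when.date()].append(value)
    reduce_fn = REDUCERS[reducer]
    # Days are unique keys, so sorting never has to compare the values.
    return sorted((day, reduce_fn(values)) for day, values in by_day.items())


def theil_sen_slope_per_year(series):
    """Median of all pairwise slopes, in value units per year."""
    if len(series) < 2:
        return None  # a slope needs at least two daily points
    slopes = [
        (v2 - v1) / ((d2 - d1).days / 365.25)
        for (d1, v1), (d2, v2) in combinations(series, 2)
    ]
    return median(slopes)


def mann_kendall_s(series):
    """Mann-Kendall S: increasing pairs minus decreasing pairs, in time order."""
    values = [value for _, value in series]
    return sum((b > a) - (b < a) for a, b in combinations(values, 2))


# Hypothetical arsenic readings in mg/L; two fall on the same day.
obs = [
    (datetime(2020, 1, 1, 8), 0.004),
    (datetime(2020, 1, 1, 16), 0.006),
    (datetime(2021, 1, 1), 0.007),
    (datetime(2022, 1, 1), 0.009),
]
series = daily_series(obs, reducer="mean")
slope = theil_sen_slope_per_year(series)  # roughly 0.002 mg/L/year
s_stat = mann_kendall_s(series)
```

Switching `reducer` to `"min"` mirrors the water-level case described above; only the per-day collapse changes, while the slope and S computation stay the same.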

Verification

  • All 21 persister tests pass (new coverage: analyte trend daily-mean/units, MCL exceedance flags + no-threshold case, recency active/stale/no-data).
  • Definitions load; 11 jobs total (4 new: nm_arsenic_trend, nm_nitrate_trend, nm_mcl_exceedance, nm_monitoring_recency).

Before running nm_mcl_exceedance

Upload config/mcl.json to the products bucket (schema = orchestration/config/mcl.sample.json), values in mg/L.

🤖 Generated with Claude Code

Three new data products built on the per-source asset graph:

- ogc_analyte_trend: per-well analyte concentration trend (Mann-Kendall +
  Theil-Sen, daily mean). One product per analyte; seeds nm_arsenic_trend
  and nm_nitrate_trend.
- ogc_mcl_exceedance (nm_mcl_exceedance): one feature per well flagging
  drinking-water MCL exceedances. Thresholds read at run time from
  gs://<bucket>/config/mcl.json (source of truth); see mcl.sample.json.
- ogc_monitoring_recency (nm_monitoring_recency): one feature per well
  with last-observation date, days_since_last, and active/stale status
  (water levels, stale > 365d).
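
The recency classification can be sketched as follows. This is a guess at the shape, not the PR's dumper: the function name `recency_properties` and the `"no_data"` label for wells without observations are hypothetical, and the strict `>` boundary follows the "stale > 365d" wording above.

```python
from datetime import datetime, timezone
from typing import Optional


def recency_properties(
    last_observation: Optional[datetime],
    now: datetime,
    stale_days: int = 365,
) -> dict:
    """Classify one well as active or stale from its last observation time."""
    if last_observation is None:
        # Assumption: wells with no observations get a distinct label.
        return {
            "last_observation_datetime": None,
            "days_since_last": None,
            "status": "no_data",
        }
    days = (now - last_observation).days
    return {
        "last_observation_datetime": last_observation.isoformat(),
        "days_since_last": days,
        "status": "stale" if days > stale_days else "active",
    }


now = datetime(2026, 6, 28, tzinfo=timezone.utc)
recent = recency_properties(datetime(2026, 1, 1, tzinfo=timezone.utc), now)
old = recency_properties(datetime(2024, 1, 1, tzinfo=timezone.utc), now)
empty = recency_properties(None, now)
```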

Implementation:
- Generalize the trend dumper: dump_waterlevel_trend_collection ->
  dump_trend_collection(slope_units, reducer, method, parameter_name);
  _daily_min_series -> _daily_series(reducer min|max|mean). slope_ft_per_year
  -> slope_per_year + slope_units.
- New dumpers dump_mcl_exceedance_collection (pivot + threshold compare)
  and dump_monitoring_recency_collection.
- GCSResource.read_json for the MCL file; die_config treats MCL as summary
  mode; definitions registers the three output types (each gets a job +
  schedule).

Offline tests cover all three. Run nm_mcl_exceedance only after uploading
config/mcl.json to the products bucket.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 28, 2026


Your pull request is automatically being deployed to Dagster Cloud.

| Location | Status | Link | Updated |
| --- | --- | --- | --- |
| die-orchestration | | View in Cloud | Jun 28, 2026 at 09:15 PM (UTC) |

jirhiker and others added 5 commits June 28, 2026 14:43
Replace the sample with the real EPA-sourced MCL file. Values in mg/L:
- arsenic 0.01, nitrate(as N) 10, fluoride 4.0, uranium 0.03 (primary)
- chloride 250, sulfate 250, tds 500 (secondary)
pH (6.5-8.5) omitted (a range, not a single MCL). Provenance + EPA
source URLs recorded in the file. Add uranium to the nm_mcl_exceedance
analyte list. Upload this file to gs://<products_bucket>/config/mcl.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a _schema block (explains every field), structured _source with EPA
URLs + retrieved date, an _omitted note (pH is a range), and per-analyte
units/basis/label/note. The product reads only mcl/type per analyte;
_-prefixed keys and extra fields are ignored, and the whole dict travels
into the output collection's mcl_thresholds as provenance.
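
A minimal sketch of that filtering, assuming analytes are top-level keys of the JSON document (the actual file layout is not shown here). The function name `analyte_thresholds` and the sample content are hypothetical.

```python
import json


def analyte_thresholds(config: dict) -> dict[str, dict]:
    """Keep only mcl/type per analyte; skip `_`-prefixed metadata keys."""
    return {
        name: {"mcl": entry["mcl"], "type": entry["type"]}
        for name, entry in config.items()
        if not name.startswith("_") and isinstance(entry, dict)
    }


# Hypothetical file content; extra per-analyte fields are simply not read.
raw = json.loads("""
{
  "_source": {"note": "hypothetical provenance block"},
  "_omitted": {"ph": "range, not a single MCL"},
  "arsenic": {"mcl": 0.01, "type": "primary", "units": "mg/L", "label": "Arsenic"},
  "tds": {"mcl": 500, "type": "secondary", "units": "mg/L"}
}
""")
thresholds = analyte_thresholds(raw)
```

The untouched `raw` dict is what would travel along as provenance, while `thresholds` is the reduced view used for the comparison.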

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The exceedance test is a direct magnitude comparison, so MCL and value
must share units and basis. Document the nitrate pitfall (EPA MCL is as
N; data may be as NO3, ~4.43x) at the comparison site.
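
The size of the nitrate pitfall follows from molar masses, as this small worked sketch shows. The helper name `nitrate_as_n` and the sample value are hypothetical; the atomic masses are standard rounded values.

```python
N_MOLAR_MASS = 14.007  # g/mol, nitrogen
O_MOLAR_MASS = 15.999  # g/mol, oxygen
NO3_MOLAR_MASS = N_MOLAR_MASS + 3 * O_MOLAR_MASS  # about 62.004 g/mol

# Mass fraction of nitrogen in nitrate; its inverse is the ~4.43x factor.
NO3_TO_N = N_MOLAR_MASS / NO3_MOLAR_MASS

NITRATE_MCL_AS_N = 10.0  # mg/L as N


def nitrate_as_n(mg_per_l_as_no3: float) -> float:
    """Convert a nitrate concentration reported as NO3 to the as-N basis."""
    return mg_per_l_as_no3 * NO3_TO_N


reported_as_no3 = 30.0  # hypothetical lab value, mg/L as NO3
naive_exceeds = reported_as_no3 > NITRATE_MCL_AS_N  # mixes bases: flags falsely
correct_exceeds = nitrate_as_n(reported_as_no3) > NITRATE_MCL_AS_N  # about 6.78 as N
```

A value of 30 mg/L as NO3 is only about 6.78 mg/L as N, so a direct comparison against the as-N MCL would report an exceedance that does not exist.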

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ogc_features.py had grown to ~628 lines mixing three concerns; the
statistics cluster is the one that isn't serialization. Move the daily
aggregation, qualification gate, Mann-Kendall + Theil-Sen test, the
thresholds, and the method-description text into backend/trend_stats.py
(pure analysis, lazily importing scipy/pymannkendall). ogc_features
re-exports them so importers and dump_trend_collection's default arg keep
working.

ogc_features 628 -> 506 lines (serialization only); trend_stats 143.
Add direct unit tests for the extracted module. 27 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dg kept disappearing from the project venv on env re-resolves because it
was never declared, breaking the AGENTS.md-recommended `dg check defs`.
Declare it so `uv run dg ...` always works. Dev/CLI only — not in the
serverless requirements.txt, so the deploy PEX is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jirhiker jirhiker force-pushed the feature/trend-mcl-recency-products branch from 1f35c31 to ec1607b Compare June 28, 2026 21:14
@jirhiker jirhiker merged commit 93157c7 into feature/st2-source-datastream-link Jun 28, 2026
4 checks passed
@jirhiker jirhiker deleted the feature/trend-mcl-recency-products branch June 28, 2026 21:20